NSF PAR Search | NSF Public Access Repository


MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs

Kil, Jihyung; Mai, Zheda; Lee, Justin; Chowdhury, Arpita; Wang, Zihe; Cheng, Kerrie; Wang, Lemeng; Liu, Ye; Chao, Wei-Lun (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

The ability to compare objects, scenes, or situations is crucial for effective decision-making and problem-solving in everyday life. For instance, comparing the freshness of apples enables better choices during grocery shopping, while comparing sofa designs helps optimize the aesthetics of our living space. Despite its significance, the comparative capability is largely unexplored in artificial general intelligence (AGI). In this paper, we introduce MLLM-COMPBENCH, a benchmark designed to evaluate the comparative reasoning capability of multimodal large language models (MLLMs). MLLM-COMPBENCH mines and pairs images through visually oriented questions covering eight dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. We curate a collection of around 40K image pairs using metadata from diverse vision datasets and CLIP similarity scores. These image pairs span a broad array of visual domains, including animals, fashion, sports, and both outdoor and indoor scenes. The questions are carefully crafted to discern relative characteristics between two images and are labeled by human annotators for accuracy and relevance. We use MLLM-COMPBENCH to evaluate recent MLLMs, including GPT-4V(ision), Gemini-Pro, and LLaVA-1.6. Our results reveal notable shortcomings in their comparative abilities. We believe MLLM-COMPBENCH not only sheds light on these limitations but also establishes a solid foundation for future enhancements in the comparative capability of MLLMs.
Full Text Available
COMPBENCH: A Comparative Reasoning Benchmark for Multimodal LLMs

Kil, Jihyung; Mai, Zheda; Lee, Justin; Wang, Zihe; Cheng, Kerrie; Wang, Lemeng; Liu, Ye; Chowdhury, Arpita; Chao, Wei-Lun (December 2024, NeurIPS)

Full Text Available
RAPTURE: a Remotely Accessible Platform of Testbeds for UAS Research and Education

https://doi.org/10.2514/6.2024-3569

Lee, Justin S; Palmer, Nicholas D; Xie, Junfei; Wan, Yan; Lu, Kejie; Fu, Shengli (July 2024, American Institute of Aeronautics and Astronautics)

Full Text Available
The Importance of Prompt Tuning for Automated Neuron Explanations

Lee, Justin; Oikarinen, Tuomas; Chatha, Arjun; Chang, Keng-Chi; Chen, Yilan; Weng, Tsui-Wei (December 2023, NeurIPS 2023 Attrib workshop)

Recent advances have greatly increased the capabilities of large language models (LLMs), but our understanding of the models and their safety has not progressed as fast. In this paper we aim to understand LLMs more deeply by studying their individual neurons. We build upon previous work showing that large language models such as GPT-4 can be useful in explaining what each neuron in a language model does. Specifically, we analyze the effect of the prompt used to generate explanations and show that reformatting the explanation prompt in a more natural way can significantly improve neuron explanation quality and greatly reduce computational cost. We demonstrate the effects of our new prompts in three different ways, incorporating both automated and human evaluations.
Using Deep Learning to Detect Islamophobia on Reddit

https://doi.org/10.32473/flairs.36.133324

Aldreabi, Esraa; Lee, Justin M.; Blackburn, Jeremy (May 2023, The International FLAIRS Conference Proceedings)

Islamophobia, a negative predilection towards the Muslim community, is present on social media platforms. In addition to causing harm to victims, it also hurts the reputation of social media platforms that claim to provide a safe online environment for all users. The volume of social media content is too large to review manually; thus, it is important to find automated solutions to combat hate speech on social media platforms. Machine learning approaches have been used in the literature as a way to automate hate speech detection. In this paper, we use deep learning techniques to detect Islamophobia on Reddit and topic modeling to analyze the content and reveal topics from comments identified as Islamophobic. Some topics we identified include the Islamic dress code, religious practices, marriage, and politics. Among the deep learning models we used to detect Islamophobia, the highest performance was achieved with BERTbase+CNN, with an F1-Score of 0.92.
Full Text Available
Polysulfides in Magnesium‐Sulfur Batteries

https://doi.org/10.1002/adma.202306239

Luo, Tongtong; Wang, Yang; Elander, Brooke; Goldstein, Michael; Mu, Yu; Wilkes, James; Fahrenbruch, Mikayla; Lee, Justin; Li, Tevin; Bao, Junwei Lucas; et al (February 2024, Advanced Materials)

Mg‐S batteries hold great promise as a potential alternative to Li‐based technologies. Their further development hinges on solving a few key challenges, including the lower capacity and poorer cycling performance when compared to Li counterparts. At the heart of the issues is the lack of knowledge on polysulfide chemical behaviors in the Mg‐S battery environment. In this Review, a comprehensive overview of the current understanding of polysulfide behaviors in Mg‐S batteries is provided. First, a systematic summary of experimental and computational techniques for polysulfide characterization is provided. Next, conversion pathways for Mg polysulfide species within the battery environment are discussed, highlighting the important role of polysulfide solubility in determining reaction kinetics and overall battery performance. The focus then shifts to the negative effects of polysulfide shuttling on Mg‐S batteries. The authors outline various strategies for achieving an optimal balance between polysulfide solubility and shuttling, including the use of electrolyte additives, polysulfide‐trapping materials, and dual‐functional catalysts. Based on the current understanding, the directions for further advancing knowledge of Mg polysulfide chemistry are identified, emphasizing the integration of experiment with computation as a powerful approach to accelerate the development of Mg‐S battery technology.
Full Text Available
Comparison of bacterial suppression by phage cocktails, dual‐receptor generalists, and coevolutionarily trained phages

https://doi.org/10.1111/eva.13518

Borin, Joshua M.; Lee, Justin J.; Gerbino, Krista R.; Meyer, Justin R. (January 2023, Evolutionary Applications)

Full Text Available
The elephant in the room: attention to salient scene features increases with comedic expertise

https://doi.org/10.1007/s10339-022-01079-0

Amir, Ori; Utterback, Konrad J.; Lee, Justin; Lee, Kevin S.; Kwon, Suehyun; Carroll, Dave M.; Papoutsaki, Alexandra (May 2022, Cognitive Processing)

Full Text Available
The positive environmental impact of virtual isotretinoin management

https://doi.org/10.1111/pde.14600

Lee, Justin; Yousaf, Ahmed; Jenkins, Samantha; Zaki, Mohammed Tamim; Napier, Cecelia; Abdul‐Aziz, Omar I.; Zinn, Zachary (May 2021, Pediatric Dermatology)

Full Text Available
Lignin-Derived Non-Heme Iron and Manganese Complexes: Catalysts for the On-Demand Production of Chlorine Dioxide in Water under Mild Conditions

https://doi.org/10.1021/acs.inorgchem.0c02742

Champ, Tayyebeh B.; Jang, Jun H.; Lee, Justin L.; Wu, Guang; Reynolds, Michael A.; Abu-Omar, Mahdi M. (March 2021, Inorganic Chemistry)
Full Text Available
